Rare event data


Nonuniform Negative Sampling and Log Odds Correction with Rare Events Data

Neural Information Processing Systems

We investigate the issue of parameter estimation with nonuniform negative sampling for imbalanced data. We first prove that, with imbalanced data, the available information about unknown parameters is tied only to the relatively small number of positive instances, which justifies the use of negative sampling. However, if the negative instances are subsampled to the same level as the positive cases, there is information loss. To retain more information, we derive the asymptotic distribution of a general inverse probability weighted (IPW) estimator and obtain the optimal sampling probability that minimizes its variance. To further improve the estimation efficiency over the IPW method, we propose a likelihood-based estimator that corrects the log odds for the sampled data and prove that the improved estimator has the smallest asymptotic variance among a large class of estimators. It is also more robust to pilot misspecification. We validate our approach on simulated data as well as a real click-through rate dataset with more than 0.3 trillion instances, collected over a period of a month. Both theoretical and empirical results demonstrate the effectiveness of our method.
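
As a concrete reading of the sampling-and-correction recipe, here is a minimal sketch under assumed settings (simulated data, a uniform negative-sampling rate, and statsmodels for the fits). It illustrates the general idea rather than the authors' implementation; the paper's optimal nonuniform probabilities, which require a pilot estimate, are omitted.

```python
# Minimal sketch (illustration only, not the authors' code) of negative
# sampling with a log-odds correction and an IPW alternative. The
# negative-sampling probability is uniform here for brevity.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated imbalanced data: the large negative intercept makes positives rare.
n, d = 200_000, 5
X = sm.add_constant(rng.normal(size=(n, d)))
beta_true = np.array([-7.0, 1.0, -1.0, 0.5, 0.5, -0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

# Keep every positive; keep each negative with probability pi_neg.
pi_neg = 5 * y.sum() / (y == 0).sum()          # roughly 5 negatives per positive
keep = (y == 1) | (rng.uniform(size=n) < pi_neg)

# Log-odds correction: sampling negatives at rate pi_neg multiplies the odds of
# the sampled data by 1/pi_neg, so fit with offset -log(pi_neg) on every point.
offset = np.full(keep.sum(), -np.log(pi_neg))
corrected = sm.GLM(y[keep], X[keep], family=sm.families.Binomial(),
                   offset=offset).fit()
print("log-odds corrected:", np.round(corrected.params, 3))

# IPW alternative: drop the offset and weight each point by 1 / inclusion prob.
w = np.where(y[keep] == 1, 1.0, 1.0 / pi_neg)
ipw = sm.GLM(y[keep], X[keep], family=sm.families.Binomial(),
             freq_weights=w).fit()
print("IPW estimate      :", np.round(ipw.params, 3))
```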



A comprehensive survey on rare event prediction

AIHub

Rare events are infrequent incidents characterized by scarcity, often presenting computational challenges in data analysis. As the name suggests, these events don't happen often, but they have a significant impact when they do. For example, in pulp-and-paper manufacturing, paper breakage that occurs only about 1% of the time can cost $10,000 per hour. Predicting such elusive occurrences matters for cost management, operational efficiency, and energy conservation. In fact, these rare events are hidden pieces that, when discovered and understood, can lead to better decision-making and more efficient models.


Logistic Regression for Massive Data with Rare Events

Wang, HaiYing

arXiv.org Machine Learning

This paper studies binary logistic regression for rare events data, or imbalanced data, where the number of events (observations in one class, often called cases) is significantly smaller than the number of nonevents (observations in the other class, often called controls). We first derive the asymptotic distribution of the maximum likelihood estimator (MLE) of the unknown parameter, which shows that the asymptotic variance converges to zero at the rate of the inverse of the number of events rather than the inverse of the full data sample size. This indicates that the available information in rare events data is at the scale of the number of events instead of the full data sample size. Furthermore, we prove that when the nonevents are under-sampled down to a small proportion, the resulting under-sampled estimator may have an asymptotic distribution identical to that of the full data MLE. This demonstrates the advantage of under-sampling nonevents for rare events data, because this procedure may significantly reduce the computation and/or data collection costs. Another common practice in analyzing rare events data is to over-sample (replicate) the events, which has a higher computational cost. We show that this procedure may even result in efficiency loss in terms of parameter estimation.
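
To illustrate the under-sampling idea, the sketch below (an assumed setup, not the paper's code) keeps all events, retains nonevents at a uniform rate rho, and applies the standard case-control intercept correction of adding log(rho); the simulation settings and variable names are placeholders.

```python
# Sketch of under-sampling the nonevents with an intercept correction,
# compared against the full-data MLE. Settings are illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

n, d = 500_000, 3
X = sm.add_constant(rng.normal(size=(n, d)))
beta = np.array([-8.0, 1.0, -0.5, 0.5])          # intercept -8 makes events rare
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))

# Keep every event; keep each nonevent with probability rho.
rho = 10 * y.sum() / (y == 0).sum()              # roughly 10 controls per case
keep = (y == 1) | (rng.uniform(size=n) < rho)

sub = sm.Logit(y[keep], X[keep]).fit(disp=0)

# Under-sampling controls at rate rho inflates the odds by 1/rho, so only the
# intercept is biased: add log(rho) back to recover the original scale.
beta_sub = sub.params.copy()
beta_sub[0] += np.log(rho)

full = sm.Logit(y, X).fit(disp=0)                # full-data MLE for comparison
print("full-data MLE       :", np.round(full.params, 3))
print("under-sampled (corr):", np.round(beta_sub, 3))
```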


Logistic Regressions and Rare Events

#artificialintelligence

I previously worked on designing some problem sets for a PhD class. One of the assignments dealt with a simple classification problem using data I took from a Kaggle challenge on predicting fraudulent credit card transactions. The goal of the problem is to predict the probability that a specific credit card transaction is fraudulent. One unforeseen issue with the data was that the unconditional probability that a single credit card transaction is fraudulent is very small. This type of data is known as rare events data, and it is common in many areas such as disease detection, conflict prediction and, of course, fraud detection.


Optimal Subsampling for Large Sample Logistic Regression

Wang, HaiYing, Zhu, Rong, Ma, Ping

arXiv.org Machine Learning

For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce the computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where statistical leverage scores are often used to define subsampling probabilities. In this paper, we propose fast subsampling algorithms to efficiently approximate the maximum likelihood estimate in logistic regression. We first establish consistency and asymptotic normality of the estimator from a general subsampling algorithm, and then derive optimal subsampling probabilities that minimize the asymptotic mean squared error of the resulting estimator. An alternative minimization criterion is also proposed to further reduce the computational cost. The optimal subsampling probabilities depend on the full data estimate, so we develop a two-step algorithm to approximate the optimal subsampling procedure. This algorithm is computationally efficient and achieves a significant reduction in computing time compared with the full data approach. Consistency and asymptotic normality of the estimator from the two-step algorithm are also established. Synthetic and real data sets are used to evaluate the practical performance of the proposed method.
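
The two-step procedure is straightforward to prototype. The sketch below is an illustration (not the authors' released code) using the paper's simpler norm-based criterion, with probabilities proportional to |y_i - p_i(beta_pilot)| * ||x_i||; the data generation and the subsample sizes r0 and r are arbitrary placeholders.

```python
# Sketch of two-step optimal subsampling for logistic regression: a uniform
# pilot subsample, then sampling proportional to |y - p| * ||x||, then an
# inverse-probability weighted fit on the drawn points. Illustrative only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

n, d = 200_000, 5
X = sm.add_constant(rng.normal(size=(n, d)))
beta = np.array([0.5, 1.0, -1.0, 0.5, -0.5, 0.3])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))

def weighted_mle(Xs, ys, w):
    """Logistic MLE with (inverse-probability) weights w."""
    return sm.GLM(ys, Xs, family=sm.families.Binomial(), freq_weights=w).fit().params

# Step 1: uniform pilot subsample of size r0 gives a rough estimate of beta.
r0, r = 2_000, 10_000
pilot_idx = rng.choice(n, size=r0, replace=False)
beta_pilot = weighted_mle(X[pilot_idx], y[pilot_idx], np.ones(r0))

# Step 2: subsampling probabilities proportional to |y_i - p_i| * ||x_i||,
# then a weighted fit on the r points drawn with replacement.
p_pilot = 1.0 / (1.0 + np.exp(-X @ beta_pilot))
score = np.abs(y - p_pilot) * np.linalg.norm(X, axis=1)
probs = score / score.sum()
idx = rng.choice(n, size=r, replace=True, p=probs)
beta_hat = weighted_mle(X[idx], y[idx], 1.0 / (r * probs[idx]))

print("two-step estimate:", np.round(beta_hat, 3))
```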